Phone Synchronous Decoding with CTC Lattice
Authors
Abstract
Connectionist Temporal Classification (CTC) has recently shown improved efficiency in LVCSR decoding. One popular implementation is to use a CTC model to predict the phone posteriors at each frame which are then used for Viterbi beam search on a modified WFST network. This is still within the traditional frame synchronous decoding framework. In this paper, the peaky posterior property of a CTC model is carefully investigated and it is found that ignoring blank frames will not introduce additional search errors. Based on this phenomenon, a novel phone synchronous decoding framework is proposed. Here, a phone-level CTC lattice is constructed purely using the CTC acoustic model. The resultant CTC lattice is highly compact and removes tremendous search redundancy due to blank frames. Then, the CTC lattice can be composed with the standard WFST to yield the final decoding result. The proposed approach effectively separates the acoustic evidence calculation and the search operation. This not only significantly improves online search efficiency, but also allows flexible acoustic/linguistic resources to be used. Experiments on LVCSR tasks show that phone synchronous decoding can yield an extra 2-3 times speed up compared to the traditional frame synchronous CTC decoding implementation.
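The blank-skipping idea in the abstract can be illustrated with a minimal sketch. This is not the paper's lattice construction or its WFST composition; it only shows how frames dominated by the CTC blank symbol could be dropped before any search is run. The blank index, the 0.9 threshold, the toy posteriors, and the function names are all assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def drop_blank_frames(posteriors: Sequence[Sequence[float]],
                      blank_threshold: float = 0.9) -> List[int]:
    """Return indices of frames whose blank posterior is below the threshold.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    return [t for t, frame in enumerate(posteriors)
            if frame[BLANK] < blank_threshold]


def greedy_phones(posteriors: Sequence[Sequence[float]],
                  kept: Sequence[int]) -> List[int]:
    """Greedy CTC labelling that visits only the kept (non-blank) frames.

    A label is merged with the previous one only when the two frames are
    adjacent in time; a gap means blank frames were dropped in between,
    so the same label counts as a new phone.
    """
    phones: List[int] = []
    prev_t, prev_label = None, BLANK
    for t in kept:
        frame = posteriors[t]
        label = max(range(len(frame)), key=frame.__getitem__)
        adjacent_repeat = (prev_t == t - 1 and label == prev_label)
        if label != BLANK and not adjacent_repeat:
            phones.append(label)
        prev_t, prev_label = t, label
    return phones


# Toy posteriors over (blank, phone 1, phone 2) for 8 frames.
example = [
    [0.98, 0.01, 0.01],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.97, 0.02, 0.01],
    [0.99, 0.005, 0.005],
    [0.03, 0.95, 0.02],
    [0.02, 0.03, 0.95],
    [0.96, 0.02, 0.02],
]

kept_frames = drop_blank_frames(example)
phone_sequence = greedy_phones(example, kept_frames)
```

In this toy input, half of the frames are dominated by the blank symbol, so the search step only has to visit the remaining four. A real decoder would keep several phone candidates per kept frame rather than a single greedy label.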
Related resources
Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC
Keyword spotting (KWS) aims to detect predefined keywords in continuous speech. Recently, direct deep learning approaches have been used for KWS and achieved great success. However, these approaches mostly assume a fixed keyword vocabulary and require significant retraining effort if new keywords are to be detected. For unrestricted vocabulary, the HMM-based keyword-filler framework is still the main...
Lattice Decoding for Joint
A new joint detection method based on sphere packing lattice decoding is presented. The algorithm is suitable for both synchronous and asynchronous multiple access CDMA-DSSS systems, and it may jointly detect up to 32 users with a reasonable complexity. The detection complexity is independent of the modulation size and large M-PAM or M-QAM constellations can be used. Further, a theoretical gain...
Lattice decoding for joint detection in direct-sequence CDMA systems
A new joint detection method based on sphere packing lattice decoding is presented in this paper. The algorithm is suitable for both synchronous and asynchronous multiple access direct-sequence code-division multiple-access (DS-CDMA) systems, and it may jointly detect up to 64 users with a reasonable complexity. The detection complexity is independent of the modulation size and large M-PAM or M-Q...
Minimum hypothesis phone error as a decoding method for speech recognition
In this paper we show how methods for approximating phone error as normally used for Minimum Phone Error (MPE) discriminative training, can be used instead as a decoding criterion for lattice rescoring. This is an alternative to Confusion Networks (CN) which are commonly used in speech recognition. The standard (Maximum A Posteriori) decoding approach is a Minimum Bayes Risk estimate with respe...
Language recognition using phone lattices
This paper proposes a new phone lattice based method for automatic language recognition from speech data. By using phone lattices some approximations usually made by language identification (LID) systems relying on phonotactic constraints to simplify the training and decoding processes can be avoided. We demonstrate the use of phone lattices both in training and testing significantly improves t...